Machine translation evaluation inside QARLA
Authors
Abstract
In this work we present the fundamentals of the IQMT framework for MT evaluation. IQMT offers a common workbench on which existing evaluation metrics can be utilized. We propose the IQ measure and test it on the Chinese-to-English data from the IWSLT 2004 Evaluation Campaign. We show how the correlation with human assessments at the system level improves substantially for most individual metrics. Moreover, IQMT allows several metrics to be combined robustly, avoiding scaling problems and metric weightings. Several metric combinations were tried, but correlations did not improve significantly further.
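To make the system-level correlation concrete, the following minimal sketch (not from the paper; the system names and scores are invented placeholders) averages a metric over each system's translations and correlates those per-system averages with human assessments using Pearson's r.

```python
from statistics import mean

# Hypothetical per-segment metric scores and human assessments for three MT
# systems; the numbers are illustrative, not the IWSLT 2004 results.
metric_scores = {
    "system_a": [0.42, 0.55, 0.61],
    "system_b": [0.38, 0.47, 0.50],
    "system_c": [0.29, 0.33, 0.40],
}
human_scores = {"system_a": 3.9, "system_b": 3.4, "system_c": 2.8}

def pearson(xs, ys):
    """Pearson correlation between two equal-length lists of scores."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy)

# System-level evaluation: average the metric over each system's translations,
# then correlate those averages with the per-system human assessment.
systems = sorted(metric_scores)
metric_by_system = [mean(metric_scores[s]) for s in systems]
human_by_system = [human_scores[s] for s in systems]
print(pearson(metric_by_system, human_by_system))
```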
Similar articles
IQmt: A Framework for Automatic Machine Translation Evaluation
We present the IQMT Framework for Machine Translation Evaluation Inside QARLA. IQMT offers a common workbench in which evaluation metrics can be utilized and combined. It provides i) a measure to evaluate the quality of any set of similarity metrics (KING), ii) a measure to evaluate the quality of a translation using a set of similarity metrics (QUEEN), and iii) a measure to evaluate t...
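As a rough illustration of the QUEEN idea, the sketch below estimates the probability, over triples of reference translations, that a candidate is at least as similar to one reference as two other references are to each other under every metric in the set. The toy metrics, the sample sentences, and the restriction to distinct-reference triples are assumptions made for illustration, not the authors' implementation.

```python
from itertools import permutations

def queen(candidate, references, metrics):
    """QUEEN-style score: fraction of reference triples (r, r1, r2) for which,
    under every metric x in the set, x(candidate, r) >= x(r1, r2)."""
    triples = list(permutations(references, 3))
    if not triples:
        return 0.0
    wins = sum(
        1 for r, r1, r2 in triples
        if all(x(candidate, r) >= x(r1, r2) for x in metrics)
    )
    return wins / len(triples)

# Two toy similarity metrics (placeholders for BLEU, NIST, etc.):
def unigram_overlap(a, b):
    ta, tb = set(a.split()), set(b.split())
    return len(ta & tb) / max(len(ta | tb), 1)

def length_ratio(a, b):
    la, lb = len(a.split()), len(b.split())
    return min(la, lb) / max(la, lb, 1)

references = [
    "the meeting starts at nine",
    "the meeting begins at nine o'clock",
    "our meeting will start at nine",
]
candidate = "the meeting will begin at nine"

# The metrics are combined without any scaling or weighting, since each metric
# only contributes through the comparison x(candidate, r) >= x(r1, r2).
print(queen(candidate, references, [unigram_overlap, length_ratio]))

# KING (not shown) would meta-evaluate the metric set itself, e.g. by checking
# how often a held-out reference outscores the automatic translations under QUEEN.
```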
MT Evaluation: Human-Like vs. Human Acceptable
We present a comparative study on Machine Translation Evaluation according to two different criteria: Human Likeness and Human Acceptability. We provide empirical evidence that there is a relationship between these two kinds of evaluation: Human Likeness implies Human Acceptability but the reverse is not true. From the point of view of automatic evaluation this implies that metrics based on Hum...
Evaluating DUC 2004 Tasks with the QARLA Framework
This paper reports the application of the QARLA evaluation framework to the DUC 2004 testbed (tasks 2 and 5). Our experiment addresses two issues: how well QARLA evaluation measures correlate with human judgements, and what additional insights the QARLA framework can provide to the DUC evaluation exercises.
The Correlation of Machine Translation Evaluation Metrics with Human Judgement on Persian Language
Machine Translation Evaluation Metrics (MTEMs) are central to Machine Translation (MT) engines, which are developed through frequent evaluation. Although MTEMs are widespread today, their validity and quality for many languages are still in question. The aim of this research study was to examine the validity and assess the quality of MTEMs from the Lexical Similarity set on machine tra...
Evaluación de resúmenes automáticos mediante QARLA
This article shows an application of the QARLA evaluation framework to DUC-2004 (tasks 2 and 5). The QARLA framework makes it possible, first, to evaluate summaries with regard to different features and, second, to combine and meta-evaluate different similarity metrics, giving more weight to metrics which characterize models (manual summaries) with respect to automatic summaries.